<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Chunkr - Intelligent Document Processing Tool</title>
    <link href="https://cdn.staticfile.org/font-awesome/6.4.0/css/all.min.css" rel="stylesheet">
    <link href="https://cdn.staticfile.org/tailwindcss/2.2.19/tailwind.min.css" rel="stylesheet">
    <link href="https://fonts.googleapis.com/css2?family=Noto+Serif+SC:wght@400;500;600;700&family=Noto+Sans+SC:wght@300;400;500;700&display=swap" rel="stylesheet">
    <script src="https://cdn.jsdelivr.net/npm/mermaid@latest/dist/mermaid.min.js"></script>
    <style>
        body {
            font-family: 'Noto Sans SC', Tahoma, Arial, Roboto, "Droid Sans", "Helvetica Neue", "Droid Sans Fallback", "Heiti SC", "Hiragino Sans GB", Simsun, sans-serif;
            background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
            min-height: 100vh;
        }
        .hero-gradient {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        }
        .card-hover {
            transition: all 0.3s ease;
        }
        .card-hover:hover {
            transform: translateY(-5px);
            box-shadow: 0 20px 40px rgba(0,0,0,0.1);
        }
        .text-gradient {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
        }
        .feature-icon {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            -webkit-background-clip: text;
            -webkit-text-fill-color: transparent;
        }
        .code-block {
            background: #1e1e1e;
            border-radius: 12px;
            overflow: hidden;
        }
        .drop-cap {
            float: left;
            font-size: 4rem;
            line-height: 1;
            font-weight: 700;
            margin-right: 0.5rem;
            color: #667eea;
            font-family: 'Noto Serif SC', serif;
        }
        .section-divider {
            height: 2px;
            background: linear-gradient(to right, transparent, #667eea, transparent);
            margin: 4rem 0;
        }
        .mermaid {
            display: flex;
            justify-content: center;
            margin: 2rem 0;
        }
        .step-number {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            width: 40px;
            height: 40px;
            border-radius: 50%;
            display: flex;
            align-items: center;
            justify-content: center;
            font-weight: bold;
            margin-right: 1rem;
            flex-shrink: 0;
        }
        .highlight-box {
            background: linear-gradient(135deg, rgba(102, 126, 234, 0.1) 0%, rgba(118, 75, 162, 0.1) 100%);
            border-left: 4px solid #667eea;
            padding: 1.5rem;
            border-radius: 8px;
            margin: 2rem 0;
        }
        @keyframes fadeInUp {
            from {
                opacity: 0;
                transform: translateY(30px);
            }
            to {
                opacity: 1;
                transform: translateY(0);
            }
        }
        .animate-fadeInUp {
            animation: fadeInUp 0.8s ease-out;
        }
    </style>
</head>
<body>
    <!-- Hero Section -->
    <section class="hero-gradient text-white py-20 px-6">
        <div class="max-w-6xl mx-auto text-center animate-fadeInUp">
            <h1 class="text-5xl md:text-7xl font-bold mb-6">
                <i class="fas fa-file-alt mr-4"></i>Chunkr
            </h1>
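            <p class="text-2xl md:text-3xl mb-8 font-light">Intelligent document processing that helps AI understand every file you have</p>
            <p class="text-lg md:text-xl max-w-3xl mx-auto opacity-90">
                An open-source document intelligence tool designed to turn complex documents into structured data ready for RAG and LLM use
            </p>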
            <div class="mt-10 flex flex-wrap justify-center gap-4">
                <a href="https://github.com/lumina-ai-inc/chunkr" class="bg-white text-purple-700 px-8 py-3 rounded-full font-semibold hover:bg-gray-100 transition">
                    <i class="fab fa-github mr-2"></i>GitHub Repository
                </a>
                <a href="https://chunkr.ai" class="border-2 border-white text-white px-8 py-3 rounded-full font-semibold hover:bg-white hover:text-purple-700 transition">
                    <i class="fas fa-rocket mr-2"></i>Get Started
                </a>
            </div>
        </div>
    </section>

    <!-- Main Content -->
    <main class="max-w-6xl mx-auto px-6 py-16">
        <!-- Problem Section -->
        <section class="mb-16 animate-fadeInUp">
            <h2 class="text-4xl font-bold mb-8 text-gray-800">
                <i class="fas fa-puzzle-piece mr-3 feature-icon"></i>What Problems It Solves
            </h2>
            <div class="prose prose-lg max-w-none">
                <p class="text-gray-700 leading-relaxed mb-6">
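                    <span class="drop-cap">D</span>evelopers building RAG or knowledge-base systems often run into painful document-processing problems. Structured information is hard to extract from complex documents such as PDFs and tables, and low OCR accuracy or scrambled formatting makes matters worse. Existing tools lack flexibility and cannot meet scenario-specific needs. Building a document-processing pipeline in-house, meanwhile, is time-consuming and labor-intensive, because it involves hard engineering problems such as layout analysis and semantic chunking.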
                </p>
                <div class="grid md:grid-cols-3 gap-6 mt-8">
                    <div class="card-hover bg-white p-6 rounded-xl shadow-lg">
                        <i class="fas fa-file-pdf text-4xl mb-4 feature-icon"></i>
                        <h3 class="text-xl font-semibold mb-2">Complex Document Parsing</h3>
                        <p class="text-gray-600">Structured information is hard to extract from complex formats such as PDFs and tables</p>
                    </div>
                    <div class="card-hover bg-white p-6 rounded-xl shadow-lg">
                        <i class="fas fa-cogs text-4xl mb-4 feature-icon"></i>
                        <h3 class="text-xl font-semibold mb-2">Inflexible Tools</h3>
                        <p class="text-gray-600">Existing solutions cannot be customized for scenario-specific needs</p>
                    </div>
                    <div class="card-hover bg-white p-6 rounded-xl shadow-lg">
                        <i class="fas fa-clock text-4xl mb-4 feature-icon"></i>
                        <h3 class="text-xl font-semibold mb-2">High Development Cost</h3>
                        <p class="text-gray-600">An in-house pipeline involves complex engineering and takes significant time and effort</p>
                    </div>
                </div>
            </div>
        </section>

        <div class="section-divider"></div>

        <!-- Core Features -->
        <section class="mb-16 animate-fadeInUp">
            <h2 class="text-4xl font-bold mb-8 text-gray-800">
                <i class="fas fa-star mr-3 feature-icon"></i>Core Features Overview
            </h2>
            <div class="grid md:grid-cols-2 gap-8">
                <div class="bg-white p-8 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <i class="fas fa-eye text-3xl mr-4 feature-icon"></i>
                        <div>
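                            <h3 class="text-2xl font-semibold mb-3">High-Accuracy OCR and Layout Analysis</h3>
                            <p class="text-gray-600">Recognizes text, tables, formulas, and other elements in a document and produces structured data with bounding boxes, so the document's structure is preserved</p>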
                        </div>
                    </div>
                </div>
                <div class="bg-white p-8 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <i class="fas fa-cut text-3xl mr-4 feature-icon"></i>
                        <div>
                            <h3 class="text-2xl font-semibold mb-3">Semantic Chunking</h3>
                            <p class="text-gray-600">Splits documents into semantically complete segments, giving RAG pipelines and LLMs cleaner input to work with</p>
                        </div>
                    </div>
                </div>
                <div class="bg-white p-8 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <i class="fas fa-file-export text-3xl mr-4 feature-icon"></i>
                        <div>
                            <h3 class="text-2xl font-semibold mb-3">Multi-Format Output</h3>
                            <p class="text-gray-600">Supports HTML, Markdown, JSON, and plain-text output to fit a range of downstream tasks</p>
                        </div>
                    </div>
                </div>
                <div class="bg-white p-8 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <i class="fas fa-brain text-3xl mr-4 feature-icon"></i>
                        <div>
                            <h3 class="text-2xl font-semibold mb-3">Vision Model Support</h3>
                            <p class="text-gray-600">Integrates VLMs to handle complex elements such as charts and handwriting, so AI can actually "read" your documents</p>
                        </div>
                    </div>
                </div>
            </div>
            
            <div class="highlight-box mt-8">
                <p class="text-lg font-medium text-gray-800">
                    <i class="fas fa-lightbulb mr-2 text-yellow-500"></i>
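                    Something to consider: does your project need to process large volumes of PDF or tabular data? Could Chunkr's semantic chunking improve the performance of your RAG system?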
                </p>
            </div>
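            <div class="bg-white p-8 rounded-xl shadow-lg mt-8">
                <p class="text-gray-700 leading-relaxed mb-4">
                    To make the semantic chunking idea above more concrete, here is a minimal Python sketch. It is not taken from the Chunkr codebase: the <code>Segment</code> and <code>Chunk</code> classes, their field names, the <code>chunk_segments</code> function, and the <code>target_words</code> parameter are all hypothetical, and the greedy word-budget rule is only one simple way such grouping could work.
                </p>
                <div class="code-block">
<pre class="p-6 text-sm text-gray-100 overflow-x-auto"><code>
```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One layout element; all field names here are hypothetical."""
    kind: str    # e.g. "title", "text", "table"
    text: str
    bbox: tuple  # (left, top, width, height) in page units
    page: int = 1


@dataclass
class Chunk:
    """A group of consecutive segments that should stay together."""
    segments: list = field(default_factory=list)

    @property
    def word_count(self):
        return sum(len(s.text.split()) for s in self.segments)


def chunk_segments(segments, target_words=50):
    """Group segments greedily; a title always starts a new chunk."""
    chunks = []
    current = Chunk()
    for seg in segments:
        words = len(seg.text.split())
        starts_section = seg.kind == "title"
        over_budget = current.word_count + words > target_words
        # Close the running chunk at a section boundary or when full.
        if current.segments and (starts_section or over_budget):
            chunks.append(current)
            current = Chunk()
        current.segments.append(seg)
    if current.segments:
        chunks.append(current)
    return chunks
```
</code></pre>
                </div>
                <p class="text-gray-700 leading-relaxed mt-4">
                    The design choice worth noting is that a title stays in the same chunk as the text that follows it, so a retriever receives a heading together with its body rather than an arbitrary fixed-size window.
                </p>
            </div>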
        </section>

        <div class="section-divider"></div>

        <!-- Architecture Diagram -->
        <section class="mb-16 animate-fadeInUp">
            <h2 class="text-4xl font-bold mb-8 text-gray-800">
                <i class="fas fa-sitemap mr-3 feature-icon"></i>System Architecture
            </h2>
            <div class="bg-white p-8 rounded-xl shadow-lg">
                <div class="mermaid">
                    graph TB
                        A[Document input] -->|PDF/PPT/Word/Images| B[Chunkr processing engine]
                        B --> C[OCR]
                        B --> D[Layout analysis]
                        B --> E[Semantic chunking]
                        C --> F[Structured data]
                        D --> F
                        E --> F
                        F --> G[Multi-format output]
                        G --> H[HTML]
                        G --> I[Markdown]
                        G --> J[JSON]
                        G --> K[Plain text]
                        H --> L[RAG system]
                        I --> L
                        J --> L
                        K --> L
                        L --> M[AI applications]
                        
                        style A fill:#f9f,stroke:#333,stroke-width:2px
                        style B fill:#bbf,stroke:#333,stroke-width:2px
                        style M fill:#bfb,stroke:#333,stroke-width:2px
                </div>
            </div>
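            <div class="bg-white p-8 rounded-xl shadow-lg mt-8">
                <p class="text-gray-700 leading-relaxed mb-4">
                    The last stage of the diagram, where structured data fans out into several output formats, might look roughly like the following Python sketch. This is an illustration under stated assumptions, not Chunkr's actual API: the <code>render</code> function and the <code>kind</code>/<code>text</code> dictionary keys are hypothetical, and the HTML branch is left out to keep the sketch short.
                </p>
                <div class="code-block">
<pre class="p-6 text-sm text-gray-100 overflow-x-auto"><code>
```python
import json


def render(segments, fmt):
    """Render a list of {"kind", "text"} dicts into one output format.

    Hypothetical helper: only "json", "text" and "markdown" are handled.
    """
    if fmt == "json":
        # Keep non-ASCII characters readable instead of escaping them.
        return json.dumps(segments, ensure_ascii=False)
    if fmt == "text":
        return "\n".join(s["text"] for s in segments)
    if fmt == "markdown":
        # Titles become level-1 headings; everything else is a paragraph.
        return "\n\n".join(
            "# " + s["text"] if s["kind"] == "title" else s["text"]
            for s in segments
        )
    raise ValueError("unsupported format: " + fmt)
```
</code></pre>
                </div>
                <p class="text-gray-700 leading-relaxed mt-4">
                    Because every format is derived from the same list of segments, a downstream RAG system can pick whichever representation suits it without the document being re-processed.
                </p>
            </div>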
        </section>

        <div class="section-divider"></div>

        <!-- Use Cases -->
        <section class="mb-16 animate-fadeInUp">
            <h2 class="text-4xl font-bold mb-8 text-gray-800">
                <i class="fas fa-briefcase mr-3 feature-icon"></i>Use Cases
            </h2>
            <div class="space-y-6">
                <div class="bg-white p-6 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <div class="step-number">1</div>
                        <div>
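                            <h3 class="text-2xl font-semibold mb-3">Academic Research Data Extraction</h3>
                            <p class="text-gray-600">Researchers work through large numbers of academic PDFs; Chunkr can extract titles, tables, and citations and produce structured Markdown, which speeds up literature analysis</p>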
                        </div>
                    </div>
                </div>
                <div class="bg-white p-6 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <div class="step-number">2</div>
                        <div>
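                            <h3 class="text-2xl font-semibold mb-3">Enterprise Knowledge Base Construction</h3>
                            <p class="text-gray-600">Companies convert Word documents such as contracts and reports into JSON, and Chunkr's semantic chunking helps build an efficient retrieval system on top of them</p>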
                        </div>
                    </div>
                </div>
                <div class="bg-white p-6 rounded-xl shadow-lg card-hover">
                    <div class="flex items-start">
                        <div class="step-number">3</div>
                        <div>
                            <h3 class="text-2xl font-semibold mb-3">Financial Data Processing</h3>
                            <p class="text-gray-600">Finance teams parse complex tables in Excel or PDF files, and Chunkr provides high-accuracy